Frame(s) in Stata

Data Server Users Meeting

11/16/22

Declaration

I am just a light user of Stata

Frame

Frame is a new feature/function introduced in Stata 16.

  • Arguably one of the most significant improvements in recent Stata updates.
  • Let’s talk about the concept of frame… or data frame first.

What is a Data Frame

  • A data object used to store and present data.

  • It usually contains two-dimensional, size-mutable, potentially heterogeneous tabular data.

  • There are variations of the data frame. For example, in R it is called data.frame (base R), data.table (data.table package), or tibble (tidyverse), depending on the package used. In Python, it is DataFrame (pandas).

  • Even an Excel sheet is a data frame with limited features.

  • In Stata, it is frame(s).

What a Data Frame does

Ideally, with a data frame, a user can:

  • keep variables, values, and other data properties
  • have flexible ways to access all or a part of data
  • apply function(s) to all or a part of data
  • communicate data between different data frames

Before Stata version 16

What will you do

when you need to check some data and do some analyses, but the needed data are stored in two or more different files?

  1. load the first data set → do something → save & close → load the second data set → do something → ……
  2. open N Stata program windows to load N data sets
  3. merge all data sets together and then torture the data, your computer, and yourself
  4. all of the above

Frame(s) in Stata 16+

  • The main functionality of frame in Stata is to enable loading and processing multiple data sets within a single Stata program window.

  • Data in different data sets can communicate with each other in certain ways.

  • More flexibility in saving data-processing and analysis results.
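
As a minimal sketch of the first point, the lines below hold two data sets in memory in one Stata session. The frame name census is only an illustrative choice, and the data sets are the auto and census examples shipped with Stata.

```stata
frame reset                      // start from a single empty frame named default
sysuse auto                      // auto.dta goes into the default frame
frame create census              // hypothetical second frame
frame census: sysuse census      // census.dta goes into frame "census"

frame dir                        // both frames are now in memory
count                            // observations in the default frame (auto)
frame census: count              // observations in frame "census"
```

Neither data set has to be saved or closed to look at the other one.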

User Interface

There is a new attribute Frame: {Name} in the Data Properties.

[Screenshot: Stata main window]

[Screenshot: Stata Data Properties pane]

The default frame name is default.

Frame 101: dir/list, create & rename

All commands start with frame (or frames)

* List all frame(s)
frame dir 
* or
frame list

* these also work
frames dir
frames list

* Create a new frame
frame create new_frame

* Rename a frame
frame rename old_name new_name

* Example
frame dir                 // check available frames
frame create new          // create a new frame "new"
frame rename default old  // rename default frame to "old"
frame list                // check available frames again

Check Working/Current Frame

  • When multiple frames exist, any command will apply to the current working frame if frame prefix is not used.

  • How to check the current frame?

    • See the Data Properties window
    • Use the command frame or pwf (print working frame; remember pwd?)

Switch between Frames

  • To switch from the current frame old to the frame named new, use frame change new.

  • Or use cwf new (change working frame; remember cd?)
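
A short sketch of checking and switching the working frame in a do-file. The frame name new is just the example name used on these slides; c(frame) is the system value that holds the name of the current working frame.

```stata
frame reset                  // start from a single frame named default
frame create new             // example second frame

pwf                          // report the current working frame
cwf new                      // switch to frame "new"
display "`c(frame)'"         // name of the working frame, usable in code

frame change default         // switch back; same effect as: cwf default
display "`c(frame)'"
```

Using c(frame) is handy when a do-file needs to remember where it started and return there later.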

Delete/Remove Frame

One or ALL; there is no "some"

You can't drop several frames in a single command

Drop one frame at a time

frame dir
* drop the "new" frame
frame drop new
frame dir

Drop ALL frames

frame dir
frame reset // "frames reset" also works
* or
clear frames // "clear frame" doesn't
frame dir
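
If only some frames should go, one possible workaround is a loop that drops them one at a time. This is a sketch with made-up frame names a, b, and c; note that the current working frame cannot be dropped.

```stata
frame reset
frame create a               // hypothetical frames for illustration
frame create b
frame create c

* drop "a" and "b", keep "c"; one frame drop per frame
foreach f in a b {
    frame drop `f'
}

frame dir                    // only default and c remain
```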

Work with Frame

After creating a new frame, there are two ways to work with it.

Switch to the desired frame.

* create a new frame "apple"
frame create apple

* switch to the new frame
frame change apple //or cwf apple

* do some regular Stata things
webuse apple
summarize 


Work with Frame

Use frame prefix command: frame framename:

* switch back to the default frame
cwf default

frame apple: oneway weight treatment, sidak


Multiple commands with the frame prefix in a do-file

* within a do-file
frame apple {                              // use curly brackets, no colon
  gen new_treatment = treatment - 1
  gen weight_log10 = log10(weight)
  graph hbox weight_log10, over(new_treatment)
}
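
A small sketch of what the prefix leaves behind. It assumes, as I understand the behavior, that results stored in r() by the prefixed command remain available afterwards, and it shows that the working frame does not change. It uses the same webuse apple data as above, so it needs internet access.

```stata
frame reset
frame create apple
frame apple: webuse apple            // load the example data into frame "apple"

frame apple: summarize weight        // runs in "apple", not in the working frame
display "mean weight = " r(mean)     // stored results are still available here
display "`c(frame)'"                 // the working frame is still default
```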

Frame for Relational Data Sets

  • If two frames hold related data sets, they can be linked with the frlink (frame link?) command.

  • Linking two frames is different from merging them together.

  • The two frames have to be in one-to-one (1:1) or many-to-one (m:1) relation.

  • 🙅 one-to-many (1:m) relation is not allowed in frlink

Frame for Relational Data Sets

1. Link two frames

  • Suppose frames one and two share the variable key, and they have a 1:1 relation.

  • When one is the current working frame:
    frlink 1:1 key, frame(two)

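  • frlink creates a new variable named two in the current working frame. This link variable stores, for each observation, the number of the matching observation in frame two (missing if there is no match).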

  • The indicator variable’s name can be manually set:
    frlink 1:1 key, frame(two) gen(flag)

  • demo
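
A self-contained sketch of an m:1 link, using made-up data rather than the demo files. The frame names students and schools and all variable names are hypothetical. One student belongs to a school that is not in the schools frame, to show what an unmatched observation looks like.

```stata
frame reset

* "schools": one row per school (the "1" side)
frame create schools
frame schools {
    set obs 2
    generate school_id = _n              // 1, 2
    generate budget    = 1000 * _n
}

* "students": several rows per school (the "m" side)
frame create students
frame students {
    set obs 5
    generate school_id = ceil(_n / 2)    // 1 1 2 2 3; school 3 has no match
    generate score     = 60 + 5 * _n
}

cwf students
frlink m:1 school_id, frame(schools)     // creates link variable "schools"
list                                     // link variable = matching row in "schools"
count if missing(schools)                // unmatched students
```

Nothing is copied between the frames at this point; the link variable only records where the match lives.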

Frame for Relational Data Sets

2. Retrieve (merge) variable(s) from the linked frame

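  • After linking two frames, variable(s) in the linked frame can be copied into the current working frame with the frget (frame get?) command. Note that from() takes the link variable created by frlink, which is named two by default.

  • Basic: frget var1 var2, from(two)

  • Advanced (copy under a new name, one variable per command): frget new1 = var1, from(two)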

  • demo
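
A minimal sketch of both frget forms on toy data. The default frame plays the role of frame one here, and key, var1, var2, and new2 are placeholder names following the slide.

```stata
frame reset

* linked frame "two"
frame create two
frame two {
    set obs 3
    generate key  = _n
    generate var1 = 10 * _n
    generate var2 = 100 * _n
}

* current working frame (default) with the same key, in reverse order
set obs 3
generate key = 4 - _n                  // 3, 2, 1

frlink 1:1 key, frame(two)             // link variable is named "two"
frget var1, from(two)                  // copy var1 under the same name
frget new2 = var2, from(two)           // copy var2 under a new name
list
```

The copied values follow the key, not the row order, so var1 is 30, 20, 10 in the current frame.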

Frame for Relational Data Sets

3. Direct access/use variable in another frame

  • Use the frval() function (frame variable?) to access a variable in the linked frame without copying it into the current working frame.

  • This feature is useful in analyzing multi-level data (e.g., student-school-district)

  • demo
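
A sketch of frval() in a two-level setting, with hypothetical student and school data. The student scores are centered on a school-level value that stays in the schools frame.

```stata
frame reset

* school-level frame
frame create schools
frame schools {
    set obs 2
    generate school_id   = _n
    generate school_mean = 70 + 10 * _n      // 80, 90
}

* student-level data in the current working frame
set obs 4
generate school_id = ceil(_n / 2)            // 1 1 2 2
generate score     = 75 + 5 * _n             // 80 85 90 95

frlink m:1 school_id, frame(schools)

* frval(linkvar, varname) reads school_mean through the link
generate centered = score - frval(schools, school_mean)
list
```

The variable school_mean is never added to the student data; only the result of the expression is stored.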

Save Results to Another Frame

  • frame post lets you store analysis results in a frame other than the working frame.

  • Basic: frame post newframe (exp1) (exp2) (exp3)

  • What happens?

  • The values of (exp1), (exp2), and (exp3) are added to newframe as one new observation.

  • ⚠️ In this case, there must be three variables pre-created in newframe

  • demo
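
A sketch of the full frame post workflow, assuming the auto data shipped with Stata. The frame name results and its three variable names are illustrative; the variables are created together with the frame so that each post has somewhere to land.

```stata
frame reset
sysuse auto

* create the receiving frame with three (empty) variables
frame create results foreign mean_mpg sd_mpg

* post one row of summary statistics per group
forvalues f = 0/1 {
    quietly summarize mpg if foreign == `f'
    frame post results (`f') (r(mean)) (r(sd))
}

frame results: list              // one observation per group
```

The number of expressions in each frame post must match the number of variables in the receiving frame.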

Now you know how to make use of frame post……right?



No worries!

Life will find a way.

Save Results to a Nonexistent Frame!!

  • frame post is too complicated and not handy.

  • There are always awesome people in the world!

  • Elwyn Davies wrote a command, framesave, to make things easier.

  • The easy way:
    framesave newframe: (do some Stata things)

  • Neither frame create nor frame post is required! 🎉

  • demo

Takeaway

Frame(s) in Stata:

(personal opinion)

  • The ability to process multiple data sets at the same time.

  • An alternative way to link/merge different data sets.

  • A convenient way to store and manage analysis results.

  • Others (search online to find more use cases.)

Limitations

(personal opinion)

  • Inconvenient in checking frames and their data.

  • Commands are relatively complicated, tedious, and sometimes inconsistent.